Learning Urban Community Structures: A Collective 
Embedding Perspective with Periodic Spatial-temporal 
Mobility Graphs 

PENGYANG WANG, Missouri University of Science and Technology, MO, USA 

YANJIE FU, Missouri University of Science and Technology, MO, USA 

JIAWEI ZHANG, Florida State University, FL, USA

XIAOLIN LI, Nanjing University, Nanjing, China 

DAN LIN, Missouri University of Science and Technology, MO, USA 


Learning urban community structures refers to the efforts of quantifying, summarizing, and representing an 
urban community’s (i) static structures, e.g., Point-Of-Interest (POI) buildings and corresponding geographic
allocations, and (ii) dynamic structures, e.g., human mobility patterns among POIs. By learning the community 
structures, we can better quantitatively represent urban communities and understand their evolutions in the 
development of cities. This can help us boost commercial activities, enhance public security, foster social 
interactions, and, ultimately, yield livable, sustainable and viable environments. However, due to the complex 
nature of urban systems, it is traditionally challenging to learn the structures of urban communities. To 
address this problem, in this paper, we propose a collective embedding framework to learn the community 
structure from multiple periodic spatial-temporal graphs of human mobility. Specifically, we first exploit a 
probabilistic propagation-based approach to create a set of mobility graphs from periodic human mobility 
records. In these mobility graphs, the static POIs are regarded as vertexes, the dynamic mobility connectivities 
between POI pairs are regarded as edges, and the edge weights periodically evolve over time. A collective 
deep auto-encoder method is then developed to collaboratively learn the embeddings of POIs from multiple 
spatial-temporal mobility graphs. In addition, we develop a UGWA method (Unsupervised Graph based 
Weighted Aggregation), in order to align and aggregate the POI embeddings into the representation of the 
community structures. We apply the proposed embedding framework to two applications (i.e., spotting vibrant 
communities and predicting housing price return rates) to evaluate the performance of our proposed method. 
Extensive experimental results on real-world urban communities and human mobility data demonstrate the 
effectiveness of the proposed collective embedding framework. 

1 INTRODUCTION

Learning urban community structures refers to the efforts of quantifying, summarizing, and representing
a community’s (i) static geographic structure, e.g., important Points-Of-Interest (POIs)
and corresponding spatial allocations, and (ii) dynamic mobility structure, e.g., human mobility
patterns among the important POIs. To be specific, the static geographic structure of a community
refers to the spatial allocations and relative distances of the important POIs that provide a variety
of urban functions [56] for the community. The dynamic mobility structure describes the strengths
and dynamics of human mobility connectivity among these important POIs [61]. By learning the
representation of community structures, we can better understand the evolution of urban communities
over time. The knowledge and patterns obtained by analyzing urban community structures
can be further used to help us find better solutions to boost commercial activities, enhance public
security, and foster social interactions, which will lead to livable, sustainable, and viable environments.

All the above pieces of evidence suggest that it is highly appealing to study how to quantify
and discover urban community structures. Indeed, the emerging methodological studies on representation
learning provide a great opportunity to address this problem. Inspired by the idea
of representation learning, we propose to formulate the problem of learning urban community
structures as a spatial representation learning task. Along this line, we develop a collective embedding
analytic framework to learn urban community structures. The proposed collective embedding
framework can unify both static POI data and dynamic human mobility data as periodic spatiotemporal
mobility graphs, and collaboratively learn the embeddings of community structures from the
spatial-temporal autocorrelations among multiple mobility graphs.

However, due to the complex nature of urban systems, urban community structure learning is 
not an easy task. Three unique challenges arise in achieving this goal: 

• Graph construction: how to unify and represent the POIs and human periodic mobility 
records as a set of mobility graphs; 

• Collective embedding: how to collectively learn the embeddings of POIs from multiple 
periodic mobility graphs; 

• Embedding aggregation: how to align and aggregate POI embeddings for community 
structure representation learning. 

In what follows, we outline how we tackle these challenges. 

First, to carry out daily activities, people in urban areas often leave from one POI, visit another POI, 
and thus, interact with communities. As a result, human movements create a dynamic association, 
which varies over time, between each POI pair in a community. Graphs are an effective tool to 
represent such a kind of structural information where we can regard POIs as vertexes, and treat 
mobility connectivities between POI pairs as edges. Then, given human periodic mobility records, 
we can construct a set of periodic spatial-temporal mobility graphs to capture the dynamics of a 
community. Unfortunately, since most human mobility data are GPS data recorded by taxis, city
bikes, subways, and buses, the pick-up and drop-off points are close to, but not exactly, the origins and
destinations of the travelers. For example, a person may take a short walk from home to
the bus stop, while the dataset will only record the trip as starting from the bus stop instead of the
home. To address this problem, we propose a probabilistic spatial propagation method to estimate
mobility volumes between POI pairs.

Second, after representing human mobility records as graphs, an intuitive method is to exploit
a deep auto-encoder [6] to learn the embeddings of graph nodes (i.e., POIs). However, in this study, a
community is represented by multiple periodic mobility graphs, which necessitates a new
embedding method to simultaneously learn the embeddings of nodes (POIs) from multiple graphs
in a collective fashion. Therefore, we develop a collective deep auto-encoder method that can take
multiple graphs as inputs.

Third, after obtaining embeddings of each POI in the community, there is still a critical need 
to devise an effective fusion method to align and aggregate all the individual POI embeddings 
into the embedding of the community. We proceed with a two-step strategy. First, we use an 
unsupervised graph-based weighting method to compute the weight of each latent feature in the 
POI embeddings, and then combine the weights to aggregate the embeddings of individual POIs 
to the embeddings of POI categories (e.g., education, shopping, restaurants, entertainment) that 
are semantically aligned across communities. Later, we further aggregate the embeddings of POI 
categories into the embeddings of the community. 

To summarize, in this paper, we propose a collective embedding framework to learn the community
structure from the periodic spatiotemporal graphs of human mobility. Specifically, the
following are our four main contributions: (1) We start with a probabilistic propagation approach
to construct a set of periodic mobility graphs to represent human periodic mobility records. (2) We
propose a collective deep auto-encoder method to collaboratively learn the embeddings of POIs
from multiple spatial-temporal mobility graphs. (3) Given the learned POI embeddings, we develop
an unsupervised graph based weighted aggregation approach to effectively align and aggregate the
POI embeddings into the representation of the community structures. (4) We apply the proposed
embedding framework to spot vibrant communities (urban vibrancy for short) and predict
housing price return rates (willingness to pay for short), and the extensive experimental results
on real-world urban community and human mobility data demonstrate the effectiveness of our
approach.

2 PROBLEM STATEMENT 

In this section, we first introduce some important definitions and then formalize the community 
learning problem. 



Fig. 1. A sample urban community, where the center is a residential complex. The drop pins
surrounding the center are POIs located within one kilometer of the residential complex. The different colors
of the drop pins represent different POI categories, such as living services, education, finance, shopping,
and restaurants.

Definition 2.1. (Urban Community) A community consists of (i) a location (i.e., latitude and
longitude) of a residential complex, and (ii) a neighborhood area (e.g., a circle with a radius of 1 km).

Usually, in urban areas, a residential complex consists of multiple apartment buildings, where 
each apartment building has many apartments. In addition, there are many POIs located in the 
neighborhood area that provide a variety of urban functions and living services to residents in the 
community. The residents in this community can access these urban facilities and services within a 
walking distance. Figure 1 shows a sample urban community. 

Community detection is a hot topic in social network analysis. There are several social community detection
algorithms, such as hierarchical clustering [25], spectral clustering [13], divisive algorithms [46, 48],
and modularity-based methods [15, 22, 23]. In the scenario of urban computing, there are some
studies that aim to detect urban regions using POI data and human mobility data [12, 61]. In our
work, we focus on learning the representation of urban communities. To simplify the spatial data
preprocessing, we define an urban community as a circular neighborhood area with a residential
complex at the center. In the future, we can explore combining social
community detection methods with the representation learning framework proposed in our paper.

Definition 2.2. (Mobility Graph) The mobility graph of a community is a graph extracted from 
the POIs data and human mobility data of the community. In this graph, POIs are regarded as nodes, 
and the weights of edges are the human mobility connectivities between two POIs. 

In the methodology section, we introduce a probabilistic propagation based method to compute 
the human mobility connectivity between two POIs. Figure 2 shows an example of a mobility graph. 


Definition 2.3. (Periodic Mobility Graphs) Periodic mobility graphs describe the movements
of residents in a community throughout a period of time; each of them is an aggregation of daily
mobility graphs.

The movements of residents in a community are dynamic and always vary over time. Specifically, 
human movements are usually periodic [36, 37]. For instance, local residents mostly go to work 
in the morning and get back home in the afternoon during weekdays. To describe such periodic 
dynamics of human mobility in a community, we propose to extract the periodic mobility graphs 
of a community at a daily granularity. In our experiments, we extracted seven periodic mobility 
graphs (from Monday to Sunday) for each community. In this way, we can learn the representation 
of a community not only from the structure of such mobility graphs, but also the periodic dynamics 
of such mobility graphs. An example of periodic mobility graphs is shown in Figure 3. 
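
As a minimal sketch of the daily granularity described above, the following Python snippet groups mobility records into seven weekday buckets. The record layout (a timestamp plus planar coordinates) and the function name `split_by_weekday` are assumptions made for illustration only; they are not taken from the dataset used in the paper.

```python
from collections import defaultdict
from datetime import datetime


def split_by_weekday(records):
    """Group (timestamp, x, y) records into seven lists, Monday first.

    The record layout is a hypothetical one chosen for this sketch.
    """
    buckets = defaultdict(list)
    for timestamp, x, y in records:
        # datetime.weekday() returns 0 for Monday and 6 for Sunday.
        buckets[timestamp.weekday()].append((x, y))
    return [buckets[day] for day in range(7)]


if __name__ == "__main__":
    sample = [
        (datetime(2018, 4, 2, 8, 30), 120.0, 45.0),   # a Monday
        (datetime(2018, 4, 8, 19, 5), 300.0, 210.0),  # a Sunday
    ]
    for day, points in enumerate(split_by_weekday(sample)):
        print(day, points)
```

Each of the seven lists can then be turned into one periodic mobility graph.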

Definition 2.4. (Community Embedding) The embedding of a community is a vector representation
of the community. The vector representation describes two types of information about
the community: (i) static spatial configuration, i.e., POIs; (ii) dynamics of human mobility, i.e., the
evolving structures of mobility graphs.

It is very important to develop a representation learning model that can take periodic mobility 
graphs as inputs, and output the vector representations of communities. 

Definition 2.5. (Problem Formulation: Learning Urban Community Structure) Given
a candidate urban community with a set of POIs and GPS
trajectories of human mobility, we wish to learn the vector representation of the community,
such that the learned vector representation can describe not only static spatial configurations,
such as POIs and corresponding geographical allocations, but also the dynamic human mobility
connectivity of POIs in the community. We formulate this problem as a task of spatial representation
learning. Formally, given a set of spatial graphs $G^{(k)} = \{\mathcal{G}_1^{(k)}, \mathcal{G}_2^{(k)}, \ldots, \mathcal{G}_7^{(k)}\}$ that describe
both POIs and human mobility connectivity between each POI pair for a community $c^{(k)}$, the spatial
representation learning problem aims at learning a mapping function $f(c^{(k)}): G^{(k)} \to \mathbb{R}^d$ that can
map the structural information of multiple mobility graphs into a vector representation for the
community $c^{(k)}$. Essentially, there are three major steps: (1) construct the periodic mobility graph
set for a community; (2) collectively learn the POI embeddings from multiple mobility graphs; (3)
align and aggregate the POI embeddings into the community embedding.

3 METHODOLOGY 

We first present an overview of our proposed framework and then detail the three critical steps: 
(i) constructing periodic mobility graphs, (ii) collective POI embedding, and (iii) aligning and 
aggregating POI embeddings into community embeddings. 

3.1 Framework Overview 

The focus of this paper is to develop a representation learning framework of community structure
that can capture dynamic changes of community structure due to human mobility. Figure 4
shows our proposed framework, which consists of three main steps: (i) periodic mobility graph
construction, (ii) collective POI embedding, and (iii) aligning and aggregating POI embeddings
into community embeddings. Specifically, in the first step, we construct seven periodic mobility
graphs, where vertexes are POIs and edges represent human movement between POIs. Secondly,
we propose a collective deep auto-encoder to learn POI embeddings from the periodic mobility
graphs of each community. Finally, we exploit a graph based unsupervised weighted aggregation
method to semantically align and aggregate POI embeddings into the embeddings of POI categories.
Then, we further aggregate POI-category embeddings into community embeddings.

Fig. 2. A mobility graph that consists of six POIs. Each POI is regarded as a node in the
mobility graph. The human mobility connectivity between two POIs is regarded as the weight of an edge.

Fig. 3. An example of seven periodic mobility graphs, which represent the mobility
connectivity of the POI graphs on Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, and Sunday,
respectively.

3.2 Periodic Mobility Graph Construction 

According to Definition 2.3, we aim to learn the representation of the structure of an urban community
from mobility graphs that describe seven days of human movements among POIs. To construct
periodic mobility graphs, the key challenge is how to extract the connectivity measurements between
the POIs of a community from large-scale human movement data.

ACM Transactions on Intelligent Systems and Technology, Vol. 1, No. 1, Article 1. Publication date: April 2018.

Fig. 4. The overview of the proposed analytical framework. (Panels: periodic mobility graphs for Monday through Sunday; collective POI embedding; POI embeddings.)

Intuitively, people’s outdoor activities include the transitions from one POI to another POI,
and, ultimately, form massive mobility flows in a community. As a result, human mobility can
indicate the connectivity among POIs. Therefore, we estimate the possibility that mobile users
move from one POI to another POI to quantify the mobility connectivity between two POIs. This
step is important because it enables us to measure the human mobility connectivity between two
POIs. In this way, we can construct mobility graphs over different days to represent an urban
community. The extracted graphs will be fed into the collective embedding model in order to
learn the representation of urban communities. A straightforward method for mobility connectivity
estimation between two POIs, e.g., POI(O) and POI(D), is to directly count the total number of
visits from POI(O) to POI(D). However, such a simple method highly depends on the availability of
high-quality mobility data, in which each trip must include an origin POI and a destination POI. In
reality, human mobility data are collected from different devices and sources (e.g., smartphones,
cellular stations, GPS-equipped vehicles, location based services). Therefore, not every trip in the
human mobility data includes an accurate origin POI and an accurate destination POI. For instance,
a taxi ride usually includes an origin GPS point and a destination GPS point, but the origin GPS
point is not the origin POI, and the destination GPS point is not the destination POI either. Thus,
in order to develop a generalized estimation method that fits various trajectory data and does not
require exact origin POIs and destination POIs, we adopt the probability propagation
based method in [19] and propose the following three-step algorithm.

• Step 1: Propagate visit probability. Given the drop-off point d of a taxi trace, we model the
probability of a POI p being visited by a passenger as a parametric function, whose input x is the
distance between the drop-off point d and the destination POI p:

$$P(x) = \frac{\beta_1}{\beta_2} \cdot x \cdot \exp\left(1 - \frac{x}{\beta_2}\right) \qquad (1)$$

where $\beta_1$ and $\beta_2$ are two given hyper-parameters that control the shape of the function P(x).
Figure 5 shows the graph of P(x) with the maximum visiting probability, $\beta_1 = 0.8$,
and the most comfortable walking distance between the drop-off point and the destination
POI, $\beta_2 = 100$. We adopt this function to estimate the visiting probability from drop-off points
to destination POIs because it has several useful mathematical properties.

Fig. 5. Probability distribution w.r.t. $\beta_1 = 0.8$, $\beta_2 = 100$ (horizontal axis: distance to destination, in meters).


- $\beta_1 = \max_x P(x)$: $\beta_1$ is the maximum value of P(x), and thus $\beta_1$ can be interpreted as how
likely a mobile user is to visit the destination POI starting from the drop-off point under
the function P(x). This property allows us to easily and empirically set
the maximum probability of visiting the destination POI from the drop-off point.

- $\beta_2 = \arg\max_x P(x)$: $\beta_2$ is the value of x when P(x) = $\beta_1$. Since x is the distance between the
drop-off point and the destination POI, $\beta_2$ can be interpreted as the most comfortable walking
distance between the drop-off point and the destination POI for taxi passengers.

- When x = 0, P(x) = 0: Since a taxi may not deliver passengers into a POI building directly,
the drop-off point is usually not the destination POI. A passenger often walks a short
distance to reach the destination.

- When x > $\beta_2$, the value of P(x) keeps dropping and shows an exponential heavy-tail effect.
In the problem of visiting probability estimation, the drop-off point is usually close to
the destination POI. A mobile user is unlikely to ask a taxi driver for a drop-off
at a place that is very far away from the destination. Hence, when the distance
exceeds the most comfortable walking distance $\beta_2$, the probability keeps decreasing.

- When x < $\beta_2$, the value of P(x) keeps increasing as the distance increases. The intuition behind
this is that the drop-off point is usually not the same as the destination, and there will
be a short distance between the drop-off point and the destination. Subject to the road
network, when the destinations are located in neighboring buildings or plazas, the drop-off
points are usually the same, such as at intersections or beside the pavement near the
buildings. For example, in one commercial circle, restaurants and movie theaters are usually
close to each other. At dinner time, even if the movie theater is closer
to the drop-off point than the restaurant, the passengers will be more likely to visit the
restaurant instead of the movie theater. Therefore, the rule of “the closer, the more likely to
visit” is not applicable to the scenario of the visiting probability. If we applied an
exponential decay function directly, the visiting probability would decrease monotonically as the distance
increases, which means that the visiting probability of the closest POI would always be the highest. This is
inconsistent with the observation above.

With this function, we can propagate the visit probability of a passenger from the drop-off 
point to its surrounding POIs. 

• Step 2: Calculate POI visit probability. To evaluate the probability of the $i$-th POI $p_i$ being visited
by users, we aggregate the probabilities from all drop-off points in taxi traces: $r(p_i) =
\sum_{d \in D} P(\mathrm{dis}(d, p_i))$, where D is the drop-off point set of taxi traces in the community.

• Step 3: Calculate mobility connectivity between POI pairs. We multiply $r(p_i)$ by $r(p_j)$
to describe the possibility of users visiting $p_j$ from $p_i$. This possibility is used to quantify the
mobility connectivity between $p_i$ and $p_j$. The calculation can be formalized as:

$$\mathrm{connectivity}(p_i, p_j) = \begin{cases} r(p_i) \cdot r(p_j), & \text{if } i \neq j \\ 0, & \text{if } i = j \end{cases} \qquad (2)$$


Given a community $c^{(k)}$, we segment POI data and human mobility data into seven parts based
on the days of the week. Then, we apply the three-step mobility graph construction method to each of the
seven parts of data. After that, we obtain a graph set $G^{(k)}$ of seven mobility connectivity graphs
across POIs, where $G^{(k)} = \{\mathcal{G}_1^{(k)}, \ldots, \mathcal{G}_t^{(k)}, \ldots, \mathcal{G}_7^{(k)}\}$, and $\mathcal{G}_t^{(k)}$ denotes the mobility connectivity
graph across POIs of the community $c^{(k)}$ on the $t$-th day of a week.
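
The three-step construction above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: it assumes that POI and drop-off positions are already projected to planar coordinates in meters, it uses the Euclidean distance for dis(d, p), and the function names `visit_probability` and `mobility_graph` are hypothetical.

```python
import numpy as np


def visit_probability(x, beta1=0.8, beta2=100.0):
    """Visit probability P(x) = (beta1 / beta2) * x * exp(1 - x / beta2).

    x is the distance (in meters) between a drop-off point and a POI.
    """
    x = np.asarray(x, dtype=float)
    return (beta1 / beta2) * x * np.exp(1.0 - x / beta2)


def mobility_graph(poi_xy, dropoff_xy, beta1=0.8, beta2=100.0):
    """Build one mobility graph as an M x M weighted adjacency matrix.

    poi_xy:     (M, 2) POI positions, planar coordinates in meters (assumed).
    dropoff_xy: (D, 2) drop-off positions in the same coordinate system.
    """
    poi_xy = np.asarray(poi_xy, dtype=float)
    dropoff_xy = np.asarray(dropoff_xy, dtype=float)
    # Step 1: distances between every drop-off point and every POI, shape (D, M).
    dist = np.linalg.norm(dropoff_xy[:, None, :] - poi_xy[None, :, :], axis=2)
    # Step 2: r(p_i) is the sum of P(dis(d, p_i)) over all drop-off points d.
    r = visit_probability(dist, beta1, beta2).sum(axis=0)
    # Step 3: edge weight r(p_i) * r(p_j) for i != j, and 0 on the diagonal.
    graph = np.outer(r, r)
    np.fill_diagonal(graph, 0.0)
    return graph


if __name__ == "__main__":
    pois = [[0.0, 0.0], [200.0, 0.0], [0.0, 300.0]]
    dropoffs = [[100.0, 0.0], [50.0, 50.0]]
    print(mobility_graph(pois, dropoffs))
```

Calling `mobility_graph` once per weekday subset of the drop-off points yields the seven graphs in the set.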

3.3 Collective POI Embedding 

Since POIs are links between communities and people, mobility based POI-level features can reveal 
more patterns about the structure of the community. Along this line, we propose a collective POI 
embedding method over the periodic mobility graphs, based on auto-encoder. 

Before introducing the details of our proposed collective POI embedding method, we first give a
brief review of the traditional auto-encoder model. An auto-encoder is an unsupervised neural network
model, which projects the instances (in original feature representations) into a lower-dimensional
feature space via a series of non-linear mappings. However, the traditional auto-encoder can only
take one input in each training iteration. Since we have constructed periodic mobility
graphs to capture the spatiotemporal dynamics of the community structure, we need a collective
learning method to learn the embeddings of community structure from the inter-correlations of
multiple mobility graphs.

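To solve the problem, we propose a collective POI embedding method. Formally, for a given
community $c^{(k)}$, the $i$-th row of the constructed periodic graph $\mathcal{G}_t^{(k)}$ is used to represent the $i$-th
POI on the day $t$ of a week. Then, given a POI $p_i^{(k)}$, we have seven vectors $\{p_{i,t}^{(k)}\}$, where
$t = 1, 2, \ldots, 7$. We utilize these seven vectors as inputs. Meanwhile, we denote the embeddings of
the POI $p_i$ on the day $t$ as $\{y_{i,t}^{(k),1}, y_{i,t}^{(k),2}, \ldots, y_{i,t}^{(k),o}\}$, at hidden layers $1, 2, \ldots, o$ in the encoding step
respectively. The encoding result of $p_i$ in the targeted lower-dimensional feature space can be
represented as $z_i^{(k)} \in \mathbb{R}^N$. To handle the problem of multiple inputs, we add an embedding ensembling
process on the $(o+1)$-th layer before generating $z_i^{(k)}$, as shown in Figure 6. Then, the encoding step can
be formulated as

$$
\begin{aligned}
y_{i,t}^{(k),1} &= \sigma(W_t^{(k),1} p_{i,t}^{(k)} + b_t^{(k),1}), \quad \forall t \in \{1, 2, \ldots, 7\}, \\
y_{i,t}^{(k),r} &= \sigma(W_t^{(k),r} y_{i,t}^{(k),r-1} + b_t^{(k),r}), \quad \forall r \in \{2, 3, \ldots, o\}, \\
y_i^{(k),o+1} &= \sigma\Big(\sum_t W_t^{(k),o+1} y_{i,t}^{(k),o} + b_t^{(k),o+1}\Big), \\
z_i^{(k)} &= \sigma(W^{(k),o+2} y_i^{(k),o+1} + b^{(k),o+2}),
\end{aligned}
\qquad (3)
$$

where the $W$s and $b$s denote the weight terms and bias terms respectively. In particular, $W_t^{(k),o+1}$ denotes
the weight term and $b_t^{(k),o+1}$ denotes the bias term for $p_{i,t}^{(k)}$ in the embedding ensembling process.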

In the decoding step, the input is the embedding $z_i^{(k)}$ (the output of the encoding step),
and the final output is the reconstructed vector $\hat{p}_{i,t}^{(k)}$. First, we dispatch $z_i^{(k)}$ into seven latent
vectors, one for each day. Then, the reconstructed embeddings at each hidden layer can be represented
as $\hat{y}_{i,t}^{(k),o}, \hat{y}_{i,t}^{(k),o-1}, \ldots, \hat{y}_{i,t}^{(k),1}$. The relationship among these vector variables can be denoted as

$$
\begin{aligned}
\hat{y}_i^{(k),o+1} &= \sigma(\hat{W}^{(k),o+2} z_i^{(k)} + \hat{b}^{(k),o+2}), \\
\hat{y}_{i,t}^{(k),o} &= \sigma(\hat{W}_t^{(k),o+1} \hat{y}_i^{(k),o+1} + \hat{b}_t^{(k),o+1}), \\
\hat{y}_{i,t}^{(k),r-1} &= \sigma(\hat{W}_t^{(k),r} \hat{y}_{i,t}^{(k),r} + \hat{b}_t^{(k),r}), \quad \forall r \in \{2, 3, \ldots, o\}, \\
\hat{p}_{i,t}^{(k)} &= \sigma(\hat{W}_t^{(k),1} \hat{y}_{i,t}^{(k),1} + \hat{b}_t^{(k),1}),
\end{aligned}
\qquad (4)
$$

where the $\hat{W}$s and $\hat{b}$s denote the weight terms and bias terms respectively. In particular, $\hat{W}_t^{(k),o+1}$ denotes
the weight term and $\hat{b}_t^{(k),o+1}$ denotes the bias term for $p_{i,t}^{(k)}$ in the embedding dispatching process.
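
To make the data flow of the encoding and decoding steps concrete, the following NumPy sketch runs one forward pass of a collective auto-encoder with a single per-day hidden layer (o = 1). It is a simplified illustration under several assumptions that are not taken from the paper: bias terms are omitted, the weights are random and untrained, the ensembling layer is taken to be a sum of per-day linear maps followed by a sigmoid, and all names (`init_params`, `encode`, `decode`) are hypothetical.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def init_params(n_in, n_hidden, n_embed, n_days=7, seed=0):
    """Random, untrained weights; biases are omitted to keep the sketch short."""
    rng = np.random.default_rng(seed)

    def w(rows, cols):
        return rng.normal(scale=0.1, size=(rows, cols))

    return {
        "enc_day": [w(n_hidden, n_in) for _ in range(n_days)],           # per-day hidden layer
        "enc_ensemble": [w(n_hidden, n_hidden) for _ in range(n_days)],  # ensembling weights
        "enc_out": w(n_embed, n_hidden),                                 # shared embedding layer
        "dec_in": w(n_hidden, n_embed),                                  # shared decoder layer
        "dec_dispatch": [w(n_hidden, n_hidden) for _ in range(n_days)],  # dispatching weights
        "dec_day": [w(n_in, n_hidden) for _ in range(n_days)],           # per-day output layer
    }


def encode(params, p_days):
    """Map the seven daily input vectors of one POI to a single embedding z."""
    hidden = [sigmoid(W @ p) for W, p in zip(params["enc_day"], p_days)]
    # Ensembling: assumed here to be a sum of per-day linear maps.
    merged = sigmoid(sum(W @ h for W, h in zip(params["enc_ensemble"], hidden)))
    return sigmoid(params["enc_out"] @ merged)


def decode(params, z):
    """Dispatch the embedding z back into seven reconstructed daily vectors."""
    merged = sigmoid(params["dec_in"] @ z)
    hidden = [sigmoid(W @ merged) for W in params["dec_dispatch"]]
    return [sigmoid(W @ h) for W, h in zip(params["dec_day"], hidden)]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    params = init_params(n_in=6, n_hidden=4, n_embed=3)
    p_days = [rng.random(6) for _ in range(7)]  # synthetic rows of seven graphs
    z = encode(params, p_days)
    print(z.shape, len(decode(params, z)))
```

In practice the parameters would be trained by minimizing a reconstruction loss with a deep learning library; the sketch only shows how seven inputs share one embedding.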

For the loss function, to tackle the sparsity problem (there are many zero entries in the inputs $p_{i,t}^{(k)}$), we assign
a larger weight to the loss introduced by the non-zero features. Then, we aggregate the loss of
each day t to obtain the final loss function:

$$\mathcal{L}^{(k)} = \sum_{t \in \{1, 2, \ldots, 7\}} \sum_i \left\| (p_{i,t}^{(k)} - \hat{p}_{i,t}^{(k)}) \odot v_{i,t}^{(k)} \right\|_2^2, \qquad (5)$$

where $v_{i,t}^{(k)}$ is the weight vector corresponding to the input $p_{i,t}^{(k)}$. The entries in $v_{i,t}^{(k)}$ corresponding
to non-zero elements are set to a value $\lambda$ ($\lambda > 1$ puts a larger weight on fitting these features); the
remaining entries in $v_{i,t}^{(k)}$ are set to 1.
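
The weighted reconstruction loss for a single POI can be sketched as follows. The squared L2 norm, the default value `lam=5.0`, and the function name are assumptions made for this illustration; the text only requires the weight on non-zero entries to be larger than 1.

```python
import numpy as np


def weighted_reconstruction_loss(p_days, p_hat_days, lam=5.0):
    """Sum over days of ||(p - p_hat) * v||^2 for one POI.

    v is lam where the input p is non-zero and 1 elsewhere (lam > 1).
    """
    total = 0.0
    for p, p_hat in zip(p_days, p_hat_days):
        p = np.asarray(p, dtype=float)
        p_hat = np.asarray(p_hat, dtype=float)
        v = np.where(p != 0, lam, 1.0)  # heavier penalty on non-zero inputs
        total += float(np.sum(((p - p_hat) * v) ** 2))
    return total


if __name__ == "__main__":
    # One day, two features: a zero entry and a non-zero entry.
    print(weighted_reconstruction_loss([[0.0, 2.0]], [[1.0, 1.0]], lam=5.0))
```

Summing this quantity over all POIs of a community gives the community-level loss.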

3.4 Aligning and Aggregating POI Embeddings to Community Embeddings

With the collective deep auto-encoder, we can obtain the embeddings (vector representations) of 
POIs. However, we aim to extract the embeddings of urban communities. Since an urban community 
has many POIs, our objective is to weight, align, and aggregate the POI embeddings into the vector 
representations of urban communities. 

To achieve this goal, we proceed with two steps: (1) aggregating POI embeddings to POI-category 
embeddings, and (2) aggregating POI-category embeddings into community embeddings. 

(1) Aggregating POI embeddings to POI-category embeddings: Note that different urban
communities might have different numbers of POIs, and, thus, the sizes of the mobility graphs
vary over communities. It is therefore very challenging to semantically align and aggregate the
embeddings of individual POIs into the embeddings of a community. Unlike individual POIs, POI
categories can be semantically aligned. More importantly, unlike the number of POIs in urban
communities, the number of POI categories is fixed in every urban community. Along this line, we
propose to aggregate POI embeddings into the embeddings of POI categories for each community.
Intuitively, given a POI category, for example, education, we can sum up the embeddings of all the
education POIs into the POI-category embedding of education. However, such simple summation
ignores the fact that the latent features in a POI embedding have different importance
weights. To quantify the importance of each latent feature in the POI embeddings, we develop an
unsupervised graph based weighted aggregation (UGWA) approach.

Traditionally, if the learned latent representations of urban communities are to be applied to a
specific application problem, e.g., crime rate prediction, a straightforward method for estimating
feature weights is to directly calculate the statistical relevance between the latent features of the
learned embedding vectors and the crime rates. However, this idea does not generalize and highly
depends on the availability of the target values in prediction tasks. Our proposed unsupervised
weighting method is important because it ensures that the
learning of feature weights does not depend on the availability of the target values to be predicted,
and thus is independent of the application problems to which the embeddings are applied.

Formally, given a community $c^{(k)}$, let $g^{(k)}$ be the embedding vectors of all the POIs in the
community $c^{(k)}$. $g^{(k)} \in \mathbb{R}^{M \times N}$, where M is the number of POIs and N is the number of latent
features in the POI embedding vectors, each row of $g^{(k)}$ is the embedding (latent feature vector) of
a POI, and each column of $g^{(k)}$ is a latent feature (i.e., a dimension of the latent feature space) in POI
embeddings.

The general idea of our proposed graph based weighting method is as follows. We first create a POI
similarity graph, where a vertex is a POI, and the weight of an edge is the similarity of the two
embedding vectors of the two corresponding POIs. Then, we create an importance measurement to
quantify the importance of a latent feature in the embedding feature space by exploiting this
POI similarity graph. Specifically, for two POIs that are highly similar (a high-weight edge in the
graph), if the targeted latent feature is important, the two values of the latent feature in the two POI
embedding vectors should be consistently similar, and we will increase its importance. Otherwise, we
will penalize its importance measurement. Likewise, for two POIs that are not similar (a low-weight
edge in the graph), if the targeted latent feature is important, the two values of the latent feature in
the two POI embedding vectors should be different.

Given the $l$-th feature in the POI embedding vectors, we evaluate its importance based on the
following steps:

• Firstly, for every POI pair, we calculate the similarity between two POIs based on their
embedding vectors using the cosine similarity, which is given by:

$$sim_{i,j} = \frac{g^{(k)}[i, *] \cdot g^{(k)}[j, *]}{\|g^{(k)}[i, *]\| \, \|g^{(k)}[j, *]\|} \qquad (6)$$

where $g^{(k)}[i, *]$ is the embedding vector of POI $p_i$. Based on the above, we can build a POI similarity graph.

• Secondly, we calculate the weight of the $l$-th dimension of the feature by examining every edge
(each POI pair) of the POI similarity graph. The weight is given by

$$w_l^{(k)} = \frac{\sum_{i \in c^{(k)}} \sum_{j \in c^{(k)}} sim_{i,j} \times \left| g^{(k)}[i, l] - g^{(k)}[j, l] \right|}{M} \qquad (7)$$

The intuition behind Equation 6 and Equation 7 is as follows: if the $l$-th dimension of the latent feature
is meaningful, then when POIs $p_i$ and $p_j$ are very similar, the difference between $p_i$ and $p_j$ on the $l$-th
dimension ($|g^{(k)}[i, l] - g^{(k)}[j, l]|$) should be very small. Conversely, if the $l$-th dimension of the
latent feature is not meaningful, $|g^{(k)}[i, l] - g^{(k)}[j, l]|$ will increase; if $p_i$ and $p_j$ are very similar,
$sim_{i,j}$ will further penalize $|g^{(k)}[i, l] - g^{(k)}[j, l]|$.

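By using this method, we can obtain the latent feature weight set $\{w_l^{(k)}\} = \{w_1^{(k)}, \ldots, w_l^{(k)}, \ldots, w_N^{(k)}\}$.
The POI embeddings are then aggregated into POI-category embeddings:

$$\hat{g}^{(k)}[s, l] = \sum_{p_i \in \phi_s} g^{(k)}[i, l] \times w_l^{(k)}, \qquad (8)$$

where $\hat{g}^{(k)}$ is the POI-category embedding matrix for the community $c^{(k)}$, and $\phi_s$ is the $s$-th POI
category.

(2) POI alignment: Given a community $c^{(k)}$, we align each row of $\hat{g}^{(k)}$ into a vector: $\mathbf{G}^{(k)} =
(\hat{g}^{(k)}[1, *], \ldots, \hat{g}^{(k)}[s, *])^{\top}$, where $\mathbf{G}^{(k)}$ is the aligned community embedding, which is
also the output of the proposed representation learning framework.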
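
The weighting, aggregation, and alignment steps can be sketched together in NumPy as shown below. This is a hedged illustration, not the authors' code: it computes the feature weights as the similarity-weighted absolute differences divided by the number of POIs (following Equation 7 as printed), it encodes POI categories as integer indices, and the function name `ugwa_community_embedding` is hypothetical.

```python
import numpy as np


def ugwa_community_embedding(emb, categories, n_categories):
    """Aggregate POI embeddings into one fixed-length community vector.

    emb:          (M, N) array, one row per POI embedding.
    categories:   length-M integer category index of each POI (assumed encoding).
    n_categories: number of POI categories, identical for every community.
    Returns (community_vector, feature_weights).
    """
    emb = np.asarray(emb, dtype=float)
    categories = np.asarray(categories)
    m, n = emb.shape
    # Cosine similarity between every POI pair (the POI similarity graph).
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.maximum(norms, 1e-12)  # guard against all-zero rows
    sim = unit @ unit.T
    # Per-feature weight: similarity-weighted absolute differences over all pairs.
    diff = np.abs(emb[:, None, :] - emb[None, :, :])  # shape (M, M, N)
    weights = np.einsum("ij,ijl->l", sim, diff) / m
    # Weighted sum of the member POI embeddings of each category.
    cat_emb = np.zeros((n_categories, n))
    for s in range(n_categories):
        cat_emb[s] = (emb[categories == s] * weights).sum(axis=0)
    # Alignment: concatenate the category rows in a fixed category order.
    return cat_emb.reshape(-1), weights


if __name__ == "__main__":
    poi_emb = [[1.0, 1.0], [1.0, 2.0], [0.5, 0.1]]
    vec, w = ugwa_community_embedding(poi_emb, [0, 1, 1], n_categories=3)
    print(w, vec)
```

Because the number of categories is fixed, the returned vector has the same length for every community, regardless of how many POIs the community contains.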


4 APPLICATIONS 

To evaluate and interpret the embeddings of residential communities, we apply our proposed embedding
framework to two applications: (1) predicting willingness to pay (WTP) for urban communities,
and (2) spotting vibrant urban communities.

4.1 Predicting Willingness to Pay (WTP)

Empirical studies have shown that the WTP for communities can be reflected by the return rates of
real estate prices over a market period, i.e., rising or falling markets [14, 21, 44]. Therefore, given a
market period, WTP can be measured by the ratio of the price increase relative to the starting price
of a market period, i.e., $r = \frac{P_f - P_i}{P_i}$, where $P_f$ and $P_i$ denote the final and initial prices, respectively.

In this application, we first learn and extract the representation features of urban communities 
using the proposed collective embedding method. Then, we calculate the benchmark WTP values 
for each community. Finally, we utilize linear regression to predict the WTP for each community. 
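
The pipeline above can be illustrated with a short NumPy sketch. It is only a sketch under assumptions: the helper names are hypothetical, the regression is ordinary least squares with an intercept (the text only says "linear regression"), and the root-mean-square error is included as one way to score the predictions.

```python
import numpy as np

def return_rate(p_initial, p_final):
    """WTP benchmark: price increase relative to the starting price."""
    return (p_final - p_initial) / p_initial

def _with_intercept(X):
    # Append a column of ones so the model can learn a bias term.
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_linear_regression(X, y):
    """Ordinary least squares; returns coefficients followed by the intercept."""
    coef, *_ = np.linalg.lstsq(_with_intercept(X), y, rcond=None)
    return coef

def predict(coef, X):
    return _with_intercept(X) @ coef

def rmse(y_true, y_pred):
    """Root-mean-square error between benchmark and predicted WTP."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

Here `X` would hold one community embedding per row and `y` the corresponding return rates.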

4.2 Spotting vibrant urban communities 

We aim to spot vibrant urban communities. Intuitively, a community can be considered prosperous 
and vibrant if the community can attract a great number of mobile users to visit and consume, or if 
the community can provide a variety of products and services to residents. Therefore, we propose a 
measurement, which we call community vibrancy for simplicity, to measure the combined effect of 
both the density and diversity of consumption check-in activities. Specifically, for the community 
$c_k$, we firstly count the number of consumption check-in events as the density of consumption 


ACM Transactions on Intelligent Systems and Technology, Vol. 1, No. 1, Article 1. Publication date: April 2018. 









activities, denoted by $freq^{(k)}$. In addition, we measure the diversity by calculating the entropy of 
check-in events over different POI categories: $div^{(k)} = -\sum_{i=1}^{I} \frac{freq_i^{(k)}}{freq^{(k)}} \log \frac{freq_i^{(k)}}{freq^{(k)}}$, where $freq_i^{(k)}$ denotes the 
consumption activity amount of the $i$-th POI category in the community $c_k$. Finally, we fuse the 
density and diversity using the harmonic mean to represent the score of community vibrancy: 

$$u_k = \frac{2 \times freq^{(k)} \times div^{(k)}}{freq^{(k)} + div^{(k)}}$$
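
As a rough illustration of the vibrancy score, the sketch below computes density, entropy-based diversity, and their harmonic mean from per-category check-in counts. Two points are assumptions rather than statements from the text: the entropy is taken over category proportions, and both quantities are scaled by their maximum over all communities so that the score falls in [0, 1]. The function names are hypothetical.

```python
import math

def entropy(counts):
    """Shannon entropy of the category proportions of one community."""
    total = sum(counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in probs)

def harmonic_mean(a, b):
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)

def vibrancy_scores(counts_per_community):
    """counts_per_community[k][i]: check-ins of category i in community k."""
    densities = [sum(counts) for counts in counts_per_community]
    diversities = [entropy(counts) for counts in counts_per_community]
    max_density = max(densities) or 1      # assumed max-scaling to [0, 1]
    max_diversity = max(diversities) or 1
    return [
        harmonic_mean(d / max_density, v / max_diversity)
        for d, v in zip(densities, diversities)
    ]
```

Because the harmonic mean is dominated by the smaller of its two arguments, a community scores high only when it is both busy and diverse.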

Figure 7a shows all the communities sorted in descending order of the 
computed vibrancy scores. From the curve in Figure 7b, we can identify four inflection points, 
at the vibrancy scores 0.9667, 0.9171, 0.8934 and 0.8087, respectively. These four inflection 
points segment the curve into five segments. Accordingly, we assign a five-level 
rating, ranging from 0 to 4, to each segment as its ranking relevance label. We observe that the 
distribution of the community vibrancy scores complies with a power law distribution, which means 
that only a small number of residential communities are highly vibrant while most communities 
are around the mean value of the vibrancy scores. This observation is consistent with our common 
sense that most people are middle-class and only a small group of people are rich. 
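
The mapping from a vibrancy score to its five-level relevance label can be sketched with the four inflection points quoted above. How a score that falls exactly on an inflection point is labeled is not stated in the text; the sketch assumes it belongs to the higher level, and the names are hypothetical.

```python
from bisect import bisect_right

# Inflection points reported in the text, in ascending order.
THRESHOLDS = [0.8087, 0.8934, 0.9171, 0.9667]

def relevance_level(vibrancy_score):
    """Return a label in {0, ..., 4}: the number of thresholds that the
    score reaches (boundary scores are assigned to the higher level)."""
    return bisect_right(THRESHOLDS, vibrancy_score)
```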

Formally, let $u_k$ be the vibrancy score of the community $c_k$, and $rank_k$ denote the ranking of the 
community based on the vibrancy score $u_k$. Then, the problem of ranking vibrant communities 
can be formulated as: given the POI and human mobility data, we aim to predict the vibrancy 
ranking $rank_k$ of the community $c_k$ using the community embeddings learned by our proposed 
framework. 

5 EXPERIMENTAL RESULTS 

We provide an empirical evaluation of the performances of the proposed method on real-world 
urban community and human mobility data. 

5.1 Data Description 

Table 1 shows the statistics of the four data sources used in the experiment. The taxi GPS traces are 
collected from a Beijing taxi company. Each trajectory contains the trip ID, distance (m), travel time (s), 
average speed (km/h), pick-up time and drop-off time, and pick-up point and drop-off point. We also 
extracted POI-related data from www.dianping.com, which is a business review site in China. In 
addition, we obtain the Beijing residential community data by crawling www.soufun.com, which 
is the largest real-estate online system in China; and we obtain the check-in data of Beijing by 
crawling www.jiepang.com, which is a Chinese version of Foursquare. Each check-in event includes 
the POI name, POI category, address, longitude and latitude. 


Table 1. Statistics of the experimental data. 

Data Sources             Properties                          Statistics
Taxi Traces              Number of taxis                     13,597
                         Effective days                      92
                         Time period                         Apr. - Aug. 2012
                         Number of trips                     8,202,012
                         Number of GPS points                111,602
                         Total distance (km)                 61,269,029
Residential Communities  Number of residential communities   2,990
                         Latitude and Longitude
                         Time period of transactions         04/2011 - 09/2012
POIs                     Number of POIs                      328,668
                         Number of POI categories            20
                         Latitude and Longitude
Check-Ins                Number of check-in events           2,762,128
                         Number of POI categories            20
                         Time period                         01/2012 - 12/2012

Fig. 7. Analysis of community vibrancy: (a) analysis of vibrant communities; (b) relevance level of vibrant communities. 


5.2 The Application of WTP Prediction 

5.2.1 Experimental Setup. 

(1) Baselines. 

To evaluate the effectiveness of our proposed collective embedding method, we compare six 
feature sets: 

• Explicit Features (EF): The explicit features are explicitly defined and extracted 
from the data as follows: (i) POI numbers per category: There are twenty POI categories 
including vehicle service, car dealer, repair & maintenance, motorbike dealer & service, 
food & beverage, shopping, daily life service, sports recreation, medical service, lodging, 
tourist, real estate, government & non-government organization, culture & education, 
transportation, finance & insurance, company & factory, road furniture, named place & 
address, public service; (ii) Average commute distance; (iii) Average commute speed; (iv) 
Average commute time; (v) Number of mobilities; (vi) Average distance between POIs. 

• Latent Features (LF): The latent features are learned by the proposed collective 
embedding method. 

• The combination of EF and LF (ELF): We combine both the explicit features obtained via 
traditional feature extraction and the latent features obtained via representation learning 
into a new feature set. 

• Variation of step 1 (V-1). In the first step of the learning framework, we propose a 
probabilistic way to derive the mobility graphs over POIs for different days. A simple 
variation uses distance-based matching of the records in the trajectories with the POIs, 
and builds a transition graph deterministically. We modify the first step and keep the other 
parts of the learning framework the same. We use this version of the framework to generate 
features. 

• Variation of step 2 (V-2). In the second step of the learning framework, we propose a 
collective learning method based on the Autoencoder. An alternative is to first derive 
different embeddings using different graphs, and then compute the POI embedding as the 
average of these embeddings. We modify the second step and keep the other parts of the learning 
framework the same. We use this version of the framework to generate features. 

• Variation of step 3 (V-3). In the third step of the learning framework, we propose a graph based 
method to aggregate the POI embeddings into community embeddings. An alternative 
is to simply average the POI embeddings in the community to derive the community 
embedding. We modify the third step and keep the other parts of the learning framework the 
same. We use this version of the framework to generate features. 

Table 2. The performance comparison on WTP prediction. 

Feature set   ELF      LF       EF       V-1      V-2      V-3
RMSE          0.0036   0.0057   0.0422   0.0273   0.0350   0.0193

(2) Evaluation Metrics. 

We utilize the root-mean-square error (RMSE) to evaluate the performance. 

5.2.2 Results and Analysis. 

Table 2 shows the performance comparison of the six feature sets in terms of RMSE. We 
observe that the combination of explicit features and latent features (ELF) achieves the lowest error (0.0036), 
while the explicit features (EF) have the highest error (0.0422). The three variations of our proposed learning 
framework (V-1, V-2, and V-3) all perform worse than the latent features (LF). This supports the necessity of 
each of the three steps in the learning framework. 

5.3 Spotting vibrant urban communities 

5.3.1 Performance Comparison with Application Related Methods. 

As mentioned before, we apply the proposed collective embedding method to ranking high-rated 
urban communities as an application. Here, we choose some application-related (ranking) methods 
for comparison, in order to demonstrate the effectiveness of our proposed method. 

(1) Baseline Algorithms. 

To show the effectiveness of the collective embedding framework, we compare the 
performances of different combinations of feature sets and ranking algorithms. 

First, we used five learning-to-rank (LTR) algorithms for comparison: 

• Multiple Additive Regression Trees (MART) [17]: It is a boosted tree model in which the 
output of the model is a linear combination of the outputs of a set of regression trees. 
MART is a class of boosting algorithms that may be viewed as performing gradient descent 
in function space, using regression trees. 

• RankBoost (RB) [16]: It is a boosted pairwise ranking method, which trains multiple weak 
rankers and combines their outputs as final ranking. The basic idea of RankBoost is to 
formalize learning to rank as a problem of binary classification on instance pairs, and 
then to adopt boosting approach. Like all boosting algorithms, RankBoost trains one weak 
ranker at each round of iteration, and combines these weak rankers as the final ranking 
function. After each round, the document pairs are re-weighted: the weights of 
correctly ranked pairs are decreased and the weights of wrongly ranked pairs are increased. 

• LambdaMART (LM) [9]: It is the boosted tree version of LambdaRank, which is based on 
RankNet. LambdaMART combines MART and LambdaRank. 

• ListNet (LN) [10]: It is a listwise ranking model with the permutation top-k ranking likelihood 
as its objective function. It introduces two probability models, referred to as 
permutation probability and top-k probability, to define a listwise loss function for learning. 
A neural network and gradient descent are then employed as the model and the algorithm in the 
learning method. 

• RankNet (RN) [8]: It uses a neural network to model the underlying probabilistic cost 
function. 
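
To make the pairwise and listwise objectives described above more concrete, the sketch below writes out the RankNet pairwise cross-entropy cost and the top-one variant of the ListNet loss in NumPy. These follow the standard published formulations rather than anything specified in this paper, the function names are hypothetical, and the experiments here rely on an existing library rather than code like this.

```python
import numpy as np

def ranknet_pair_cost(s_i, s_j, target=1.0):
    """Cross entropy between the target probability that item i ranks above
    item j and the modeled probability sigmoid(s_i - s_j)."""
    diff = s_i - s_j
    # log(1 + exp(diff)) - target * diff, computed in a stable way
    return float(np.logaddexp(0.0, diff) - target * diff)

def softmax(x):
    z = np.asarray(x, dtype=float)
    z = z - z.max()                 # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def listnet_top1_loss(true_scores, predicted_scores):
    """Cross entropy between the top-one probability distributions induced
    by the ground-truth scores and by the predicted scores."""
    p_true = softmax(true_scores)
    p_pred = softmax(predicted_scores)
    return float(-(p_true * np.log(p_pred)).sum())
```

The pairwise cost looks at one pair of items at a time, while the listwise loss compares whole score lists, which is the main practical difference between RankNet and ListNet.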

Besides, we utilize the six feature sets mentioned in Section 5.2.1 for comparison. 

Finally, we create 30 combinations of features and rankers for comparisons. We use “-” 
between a feature set and a ranker to denote a combination, for instance, “ELF-MART”. 
We utilize RTree¹ to index geographic items (i.e., taxi and bus trajectories, checkins, etc.) 
and extract the defined features. For these five LTR algorithms, we use RankLib². We set the 
number of trees = 500, the number of leaves = 10, the number of threshold candidates = 
256, and the learning rate = 0.1 for both MART and LambdaMART. We set the number of 
iterations = 300 and the number of threshold candidates = 10 for RankBoost. We set the learning 
rate = 0.0005, the number of hidden layers = 1, the number of hidden nodes per layer = 10, and 
the number of epochs to train for both ListNet and RankNet. After we generate the data 
pairs {feature, ranking relevance}, we shuffle the data pairs and select 80% for training 
and 20% for testing, where feature refers to EF, LF, or ELF and ranking relevance is based 
on the corresponding vibrancy values. 

(2) Evaluation Metrics. 

Normalized Discounted Cumulative Gain (NDCG@N). The discounted cumulative gain 
(DCG@N) is given by 

$$DCG[n] = \begin{cases} rel_1 & \text{if } n = 1 \\ DCG[n-1] + \frac{rel_n}{\log_2 n} & \text{if } n \geq 2 \end{cases}$$

where $rel_n$ denotes the ranking relevance of the $n$-th community, defined in Figure 7b. Later, given the ideal discounted 
cumulative gain $DCG'$, NDCG at the $n$-th position can be computed as $NDCG[n] = \frac{DCG[n]}{DCG'[n]}$. 

The larger NDCG@N is, the higher top-N ranking accuracy is. 
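
The recursive definition above can be turned into a few lines of Python. This is a minimal sketch with hypothetical function names; it assumes that the ideal DCG is obtained by sorting the same relevance labels in descending order.

```python
import math

def dcg(relevances, n):
    """DCG[n]: rel_1 for the first position, then rel_pos / log2(pos)."""
    total = 0.0
    for pos, rel in enumerate(relevances[:n], start=1):
        total += rel if pos == 1 else rel / math.log2(pos)
    return total

def ndcg(relevances, n):
    """relevances: labels (0 to 4) in the order produced by the ranker."""
    ideal = dcg(sorted(relevances, reverse=True), n)
    return dcg(relevances, n) / ideal if ideal > 0 else 0.0
```

Note that with this form of the discount, positions 1 and 2 are weighted equally because $\log_2 2 = 1$.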

Kendall’s Tau Coefficient. Kendall’s Tau coefficient (or Tau for short) is a measure of 
rank correlation, i.e., the similarity of the orderings of the data. Let us assume that each 
community $i$ is associated with a benchmark score $y_i$ and a predicted score $f_i$. Then, a 
community pair $<i,j>$ is said to be concordant if both $y_i > y_j$ and $f_i > f_j$, or if 
both $y_i < y_j$ and $f_i < f_j$. Also, $<i,j>$ is said to be discordant if both $y_i < y_j$ and $f_i > f_j$, or 
if both $y_i > y_j$ and $f_i < f_j$. Tau is given by $Tau = \frac{\#conc - \#disc}{\#conc + \#disc}$, where $\#conc$ and $\#disc$ 
denote the numbers of concordant and discordant pairs, respectively. 
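
A direct way to compute this coefficient is to enumerate all community pairs and count the concordant and discordant ones. The sketch below does exactly that; it assumes the common form of Tau as the difference of the two counts divided by their sum, skips tied pairs, and uses a hypothetical function name.

```python
from itertools import combinations

def kendall_tau(benchmark, predicted):
    """Tau from concordant and discordant pairs; tied pairs are ignored."""
    concordant = discordant = 0
    for i, j in combinations(range(len(benchmark)), 2):
        sign = (benchmark[i] - benchmark[j]) * (predicted[i] - predicted[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    total = concordant + discordant
    return (concordant - discordant) / total if total else 0.0
```

The pairwise loop is quadratic in the number of communities, which is acceptable for a few thousand items.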



1 https://pypi.python.org/pypi/Rtree/ 

2 http://sourceforge.net/p/lemur/wiki/RankLib/ 

Table 3. The performance variances of different feature sets over neighborhood profile based community subgroups. 

          Variances on NDCG               Variances on Fmeasure           Variance on
Features  @5      @10     @15     @20     @5      @10     @15     @20     Tau
ELF       0.0527  0.0581  0.0921  0.0652  0.0019  0.0023  0.0039  0.0059  0.0526
LF        0.0797  0.0825  0.1081  0.1072  0.0018  0.0009  0.0012  0.0019  0.0888
EF        0.1609  0.2116  0.2754  0.2422  0.0022  0.0023  0.0039  0.0059  0.0987

Table 4. The performance variances of different feature sets over administrative district based community subgroups. 

          Variances on NDCG               Variances on Fmeasure           Variance on
Features  @5      @10     @15     @20     @5      @10     @15     @20     Tau
ELF       0.0871  0.0867  0.0392  0.0963  0.0019  0.0022  0.0039  0.0059  0.0285
LF        0.0913  0.1103  0.0828  0.1389  0.0022  0.0023  0.0039  0.0059  0.0925
EF        0.2970  0.2937  0.2811  0.1961  0.0019  0.0023  0.0039  0.0059  0.1027

F-measure@N. F-measure@N incorporates both precision and recall in a single metric 
by taking their harmonic mean: $F@N = \frac{2 \times Precision@N \times Recall@N}{Precision@N + Recall@N}$. Since we use a five-level 
rating system (4 > 3 > 2 > 1 > 0) instead of binary rating, we treat the rating $\geq 3$ as 
“high-vibrancy” and the rating $< 3$ as “low-vibrancy”. Given a top-N community list $E_N$ 
sorted in a descending order of the prediction values, the precision and recall are defined 
as $Precision@N = \frac{|E_N \cap E_{\geq 3}|}{N}$ and $Recall@N = \frac{|E_N \cap E_{\geq 3}|}{|E_{\geq 3}|}$, where $E_{\geq 3}$ are the communities 
whose ratings are greater or equal to three. 
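
The definitions above can be sketched as a small Python function. It assumes the input is the list of true five-level ratings already ordered by descending predicted score; the function name and the `threshold` parameter are hypothetical.

```python
def f_measure_at_n(ranked_ratings, n, threshold=3):
    """F-measure@N with ratings >= threshold treated as high-vibrancy."""
    hits = sum(1 for r in ranked_ratings[:n] if r >= threshold)
    relevant = sum(1 for r in ranked_ratings if r >= threshold)
    if hits == 0:
        return 0.0                    # also covers the case of no relevant items
    precision = hits / n              # share of the top-N that is high-vibrancy
    recall = hits / relevant          # share of high-vibrancy items retrieved
    return 2 * precision * recall / (precision + recall)
```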

(3) Results and Analysis. Figure 8 shows the performance comparison of the fifteen combinations 
of the feature sets and the ranking algorithms in terms of Tau, NDCG@N and F-measure@N. 
In all cases, we observe a significant improvement by considering the learned embeddings 
with respect to baselines. 

First, we control the ranker and investigate the effectiveness of different feature sets. For each of 
the five rankers, the combination of explicit features and latent features performs the best. 
Besides, we observe that the latent features outperform the explicit features. In particular, for 
NDCG@N, as N gets larger, the results more clearly demonstrate the superiority of the 
latent features learned by our framework. This observation suggests that the latent features 
are discriminative for spotting top vibrant communities. A potential interpretation of this 
observation is that human mobility in dynamic spatiotemporal graphs provides more 
information about community structures than the static geographical locations. When we combine 
the latent and explicit features together, both the dynamic and static structural information 
of a community are combined to provide a more comprehensive and effective representation 
for communities. Therefore, the predictive accuracies are significantly improved. 

5.3.2 Comparison with Representation Learning Algorithms. 

We compare our proposed representation learning framework with other state-of-the-art representation 
learning algorithms to evaluate the representation learning performance. 

(1) Baseline Algorithms. 

We take three state-of-the-art representation learning algorithms as baselines, including 
Skip-gram, Restricted Boltzmann Machines (RBMs), and Non-negative Matrix Factorization (NMF). 

• Skip-gram: It is a type of Word2vec model that is used to produce word embeddings. 
Skip-gram uses the current word to predict the surrounding window of context words. 
The skip-gram architecture weighs nearby context words more heavily than more distant 
context words [45]. 


Fig. 8. The overall performance comparisons of the fifteen feature and ranker combinations in terms of 
NDCG, F-measure, and Tau: (a) NDCG@N; (b) Fmeasure@N; (c) Tau. 

• RBMs: It is a generative stochastic artificial neural network that can learn a probability 
distribution over its set of inputs. RBMs are a two-layer undirected graphical model, which 
can produce distributed representations of the input and perform well in terms of retrieval 
accuracy [29]. 

• NMF: It is a technique that learns a low-dimensional representation of a dataset. When 
applying NMF over a matrix, the factorized sub-matrix can be interpreted as the latent 
representation of the original matrix[51]. 
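
As an illustration of the NMF baseline, the sketch below factorizes a non-negative matrix with the classic multiplicative update rules for the squared reconstruction error. The choice of solver is an assumption, since the text does not say which NMF algorithm or library was used, and the function name and parameters are hypothetical.

```python
import numpy as np

def nmf(V, rank, iterations=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V (n x m) as W @ H with
    W (n x rank) and H (rank x m), both non-negative."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(iterations):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

If `V` is, for example, a POI-by-POI mobility matrix, each row of `W` can serve as the low-dimensional latent representation of the corresponding POI.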

We use these three baselines and our proposed framework to generate representation 
learning results. Then, we feed these representation learning results into the five LTR algorithms, 
to examine the representation learning performance. After we generate the data pairs 
{feature, ranking relevance}, we shuffle the data pairs and select 80% for training and 
20% for testing, where feature refers to EF, LF, or ELF and ranking relevance is based on 
the corresponding vibrancy values. 

(2) Evaluation Metric. 

We compare the performance of our proposed framework, Skip-gram, RBMs and NMF, in 
terms of NDCG@N. 

(3) Results and Analysis. 

The experimental results are shown in Figure 9. In most cases, our 
proposed model outperforms the skip-gram model. However, for NDCG@5, the skip-gram is 
better than our model in some cases. A potential explanation is that the skip-gram algorithm 
weighs nearby context words more heavily than distant context words. In this way, the 
connectivities of the neighboring POIs in a community play a more important role in the 
skip-gram algorithm for identifying top vibrant communities. However, the auto-encoder is 
still better suited for the spatial graph embedding scenario than the skip-gram method. There 
are two main reasons. 

(a) First, the skip-gram algorithm is originally designed for word embedding, in order to 
consider the semantics of neighboring words in the sentences. If we regard an urban community 
as a document and regard a POI as a word, it is very difficult to define what is “neighboring 
words in the sentences”. In the spatial scenario, human mobility connectivity is related to 
but not totally determined by geographic distances. No matter whether the definition of 
“neighboring in the sentences” is based on distance or human mobility connectivity, the 
POIs are difficult to organize properly as a “semantic” sentence. However, if we 
represent a mix of POIs, human mobility data, and urban communities as graphs, the Autoencoder 
is capable of projecting these graphs into lower dimensional vectors, while implicitly preserving the 
relationships between POIs in the embedded vectors. This mechanism does not 
require us to explicitly define what are “neighboring words in the sentences”. 

(b) Second, in this paper, we consider the periodical patterns of the community structure. 
Therefore, we need to collectively model the inputs of different days in a week. Thanks to 
the nature of the neural network, the Autoencoder can be modified and improved 
to meet this requirement more easily than the skip-gram model. 

5.4 Robustness Check 

We apply the learned embeddings and the ranking algorithms to different subgroups of the 
communities, to examine the robustness of our method in these subgroups. We used two grouping 
methods to segment the communities into multiple subgroups: (i) neighborhood profile based 
grouping; (ii) administrative district based grouping. For (i), we applied K-Means [31] to cluster the 
communities into five groups. The communities in each group generally share similar functionality 
and representations. For (ii), we chose four administrative districts in Beijing: Haidian, Chaoyang, 
Xicheng, and Dongcheng for grouping, because most of the communities are located in the above 
four districts. Later, we conduct the robustness check from two perspectives. 

Fig. 9. The representation learning comparisons of the Skip-gram, RBMs, NMF and our proposed framework 
in terms of NDCG@N (N = 5, 10, 15, 20): (a) NDCG@N comparisons over LambdaMART; (b) NDCG@N comparisons 
over ListNet; (c) NDCG@N comparisons over MART; (e) NDCG@N comparisons over RankNet. 

(i) NDCG@N performance comparison. We intend to answer the following question: 
Compared to the accuracy of our method in all the communities, will our proposed method be 
consistently effective in the community subgroups? From Figure 8 we have observed that the “ELF-LN” 
combination performs the best in all the communities. Therefore, we pick “ELF-LN” and examine 
the effectiveness of “ELF-LN” on the community subgroups. Figure 10 indicates that “ELF-LN” 
is consistently accurate in the neighborhood profile 
based community subgroups in terms of NDCG@N. We can obtain similar observations in the 
administrative district based community subgroups. 

(ii) Average variance of performance. For each feature set, we measure the performance 
variance of all the feature-ranker combinations in terms of NDCG, F-measure, and Tau. Table 3 
and Table 4 show: (1) Overall, the variance of the “ELF” feature set is the smallest; (2) The latent 
features in community embeddings (“LF”) achieve the second smallest variance and are much better 
than the explicitly extracted features. 

The results validate the robustness of our method in different community subgroups. 
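
One possible reading of the "average variance of performance" is sketched below: for a fixed feature set, take the variance of a metric across the community subgroups for each ranker, then average over the rankers. The exact aggregation is not spelled out in the text, so this reading, the use of the population variance, and the function name are assumptions.

```python
import numpy as np

def average_subgroup_variance(scores):
    """scores[r][g]: metric value (e.g., NDCG@10) of ranker r on subgroup g,
    all obtained with the same feature set."""
    scores = np.asarray(scores, dtype=float)
    per_ranker_variance = scores.var(axis=1)   # population variance across subgroups
    return float(per_ranker_variance.mean())   # average over the rankers
```

A smaller value means the feature set performs more evenly across subgroups, which is the sense of robustness examined here.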


Fig. 9 (continued). (d) NDCG@N comparisons over RankBoost. 

Fig. 10. The NDCG@Ns of the ELF-LN combination over community subgroups (N=5, 10, 15, 20): (a) the NDCG@Ns of 
ELF-LN over the five neighborhood profile based subgroups and all the communities; (b) the NDCG@Ns of ELF-LN 
over the four administrative district based subgroups and all the communities. 


5.5 Investigation of Community Structure Properties 

We investigate the structure of urban communities in Beijing in two aspects: (i) community 
connectivities, and (ii) the learned representation of the community structure. 

5.5.1 Community Connectivities. We utilize traffic flow as the estimation of the community 
connectivities. As shown in Figure 11, we use a heat map to visualize the results. The darker the 
color is, the higher the connectivity is. We can observe that communities with high connectivities 
are mainly distributed around the main road network of Beijing, which suggests that the 
convenience of transportation utilities contributes a lot to connectivities. 

Fig. 11. Visualization of the community connectivities. 

5.5.2 The Learned Representation of the Community Structure. For simplicity, we choose two 
communities with similar POI distributions over the POI categories, as shown in Table 5. 
Then, we utilize a heat map to visualize the learned representations of community structures of 
these two communities. In Figure 12, each column represents the corresponding dimension of the 
learned representation space, and each row represents the corresponding community. Moreover, 
the color represents the value of the corresponding dimension. The darker the color is, the higher 
the value is. We can observe that for these two communities with similar POI distributions, the 
learned representations of community structures are also very similar. 

Table 5. The POI distributions of two selected communities. 

             Restaurant  Business/Public Agency  Entertainment  Transportation/Lodge  Others
Community 1  3572        7858                    1951           2136                  311
Community 2  2395        7487                    1968           2423                  420

6 RELATED WORK 

Representation learning. Our work has connections with representation learning which can be 
categorized into three main approaches: (i) the probabilistic models, (ii) the geometrically motivated 
manifold-learning approaches, and (iii) the reconstruction-based algorithms related to auto-encoder. 

The key idea of the probabilistic model based approaches is to use unsupervised feature learning 
to learn a hierarchy of features one level at a time [6, 28, 35, 47, 50]. For example, Wang et al. used 
a regression learner to learn the optimized layout of heterogeneous elements on the search result 
page (SERP) [53]. The work in [2] used an unsupervised learning method to obtain a hierarchy of 
features one level at a time and to learn a new transformation at each level to be composed with 
the previously learned transformations. 

In the second category, the large majority of the algorithms adopt a non-parametric approach, 
based on a training set nearest neighbor graph [7, 43, 49, 54, 54]. Hinton et al. [27] and Bengio et al. 
[4] exploited the Restricted Boltzmann Machines (RBMs) to perform unsupervised feature learning 
for natural image modeling. The work in [43] introduced “t-SNE”, a variation of Stochastic Neighbor 
Embedding [26] that is built on a geometric perspective and adopts a non-parametric approach 
based on a training set nearest neighbor graph. 

Fig. 12. Visualization of the learned structure representations of two similar communities. 

Compared to probabilistic models, the auto-encoder based methods do not need to deal with the 
complicated posterior distributions that arise from the use of latent variables. Auto-encoders can directly 
parameterize features or representation functions, and learn a direct encoding [3, 5, 30, 34, 58, 65]. 
Therefore, we chose the auto-encoder method as our base model and further developed a collective 
spatiotemporal auto-encoder to learn the representation of community structure. 

Urban computing. Urban computing is a process aiming to tackle major issues in cities by 
analyzing and modeling urban data (e.g., traffic flow, human mobility, and geographical data). One 
of the biggest challenges in urban computing is to compute with heterogeneous data [63]. Zheng 
et al. showed that setting equal weights for different data sources in a regression or classification 
model does not achieve the best performance [64]. Yuan et al. discovered regional functions of a city 
using POIs and taxi traces [61]. Zhang et al. first detected spatiotemporal hotspots from 
geo-tagged social media data and then used both reconstruction and single graph based strategies 
to learn the representations of geo-tagged time-stamped words [62]. Zhang et al. employed 
an accelerated mode seeking procedure to detect spatial-temporal hot spots underlying people’s 
activities, and jointly embedded all spatial, temporal, and textual units into the same space [62]. 
Compared to the embeddings of geo-tagged time-stamped words, our work targets a different 
spatial granularity and aims at learning the representation of an entire urban community. Fu et al. 
proposed a probabilistic latent factor model to learn the portfolios of urban functions in a zone [18]. 
Cici et al. identified emerging patterns with a multi-relational approach from spatial data [11]. Wang et 
al. adopted the skip-gram model to learn the region representation from urban and mobile data [52]. 
Different from the skip-gram model on a single graph, our work focuses on collectively learning from 
multiple spatiotemporal graphs. Bejan et al. mined the driving route for end users by considering 
physical features of a route, traffic flow, and driving behavior [1]. Liu et al. detected spatial-temporal 
causality of outliers in traffic data [40]. Liu et al. provided an integrated mobility pattern analysis 
between the location traces of taxicabs and the mobility records in bus transactions [41]. Lan et al. 
introduced a road segment-based anomaly detection problem, which is to detect the abnormal road 
segments each of which has its “real” traffic deviating from its “expected” traffic and to infer the 
major causes of anomalies on the road network [33]. Liu et al. focused on the identification and 
optimization of flawed region pairs with problematic bus routing to improve utilization efficiency 
of public transportation services, according to people’s real demand for public transportation [42]. 
Liu et al. provided a focused study of temporal retweeting patterns and their influence on social 
media marketing campaigns [38]. Yao et al. presented a novel method which incorporates the degree 
of temporal matching between users and POIs into personalized POI recommendations [60]. Fu 
et al. developed a system, named CUMMA, for classifying service usages of mobile messaging 
Apps by jointly modeling user behavioral patterns, network traffic characteristics, and temporal 
dependencies [20]. Liu et al. proposed a bike sharing network optimization approach by considering 
multiple influential factors, to enhance the quality and efficiency of the bike sharing service by 
selecting the right station locations [39]. Yao et al. proposed a Deep Multi-View Spatial-Temporal 
Network (DMVST-Net) framework to model both spatial and temporal relations [59]. 

Learning to rank. Our work is also related to Learning-to-Rank (LTR) methods, which include 
pointwise, pairwise, and listwise approaches. The pointwise methods [24] reduce the LTR task to a 
regression problem: given a single query-document pair, predict its score. The pairwise methods 
reduce the LTR task to a classification problem. The goal of pairwise ranking is to learn 
a binary classifier to identify the better document in a given document pair by minimizing the 
average number of inversions in ranking [8, 16]. The listwise methods optimize a ranking loss 
metric over lists instead of document pairs [55]. For instance, Li et al. proposed AdaRank [57] 
and ListNet [10] and Burges et al. proposed LambdaMART [9]. More recent work in [32] further 
learned a ranking model that is constrained to have only a few nonzero coefficients using an 
L1 constraint, and proposed a learning algorithm from the primal dual perspective. 

7 CONCLUSION REMARKS 

In this paper, we studied the problem of learning urban community structures. We took into 
account not only the points of interest but also human movement among these POIs. We formulated 
the problem as a learning task over multiple mobility graphs of POIs and proposed a novel collective 
embedding framework. The framework consists of three major steps. We started with a probabilistic 
propagation method to unify and represent static POIs and dynamic human mobility records as 
periodic spatial-temporal mobility graphs. We then developed a collective embedding method to 
learn the embeddings of POIs from the obtained mobility graphs. Based on the POI embeddings, we 
further proposed an unsupervised graph based weighted aggregation method to derive community 
embeddings. To evaluate the performance of the proposed approach, we applied it to predict WTP 
for communities and spot vibrant communities on real datasets. The experimental results show 
that our approach can effectively learn the representation of community structures and substantially 
enhance the vibrant community prediction accuracy. Finally, it is worth noting that our proposed 
collective framework also has the potential to be generalized to learn the structural representations 
of other geographic items. 